Which demographic characteristics explain differences in average state wages?
This study examines the relationship between state-level demographics, labor force participation, and economic outcomes using pooled data from the Current Population Survey Annual Social and Economic Supplement (CPS ASEC). States differ in their demographic makeup, employment status, and wage levels. Understanding how these factors interact provides insight into broader economic performance. My primary goal is to examine how demographic and labor market characteristics such as gender, race, age, citizenship, and employment status relate to average hourly wages and household income at the state level. By studying these relationships across states and years, this study identifies which population characteristics are most strongly associated with economic success and whether certain demographic profiles are linked to stronger labor market outcomes.
This research has meaningful implications for labor economics and public policy. State governments allocate resources for workforce development, education, job training, and social assistance programs. Understanding which demographic or labor-force factors correlate with stronger wage outcomes can help policymakers allocate funding more effectively. Examining state-level demographic differences also sheds light on economic inequality, because differences in wage levels across states often translate into differences in opportunities and living standards.
My dataset contains state-level statistics from 2000 to 2023, created by computing weighted averages for each state-year observation. The data come from IPUMS CPS: LINK. After cleaning the wage and income values, recoding demographic variables, and constructing an employment status grouping (employed, unemployed, and not in the labor force), I generated state-level measures including average hourly wage, median wage, average age, household income, gender composition, citizenship rates, racial distributions, and labor force participation rates. Each row in the dataset represents the economic and demographic profile of one state in one year. This structure allows me to compare wage patterns across states, analyze how demographic differences correlate with economic outcomes, and observe how labor market indicators change over time. Because the data span more than two decades and include every state, I can use regression analysis to identify relationships between demographics and wages.
How does demographic composition relate to hourly wages and household income?
How strongly is employment status associated with hourly wages and household income?
Do certain U.S. regions consistently have higher wages or household income than others? (Planned approach: group states into regions by FIPS code and map the economic indicators.)
States in the U.S. show large differences in wages, incomes, and labor market outcomes. At the same time, states differ in their demographic makeup and in how many residents are employed, unemployed, or out of the labor force. These differences raise important questions about what drives economic success at the state level.
Wages vary across states for several reasons. States with higher concentrations of high-productivity industries such as technology, finance, or infrastructure tend to pay higher wages than states that rely more on agriculture, manufacturing, or low-wage services. Differences in education levels, cost of living, and state policies also contribute to wage gaps. Labor economics research shows that more highly educated workforces generally earn more and experience stronger long-term wage growth.
Demographics also shape state economies. Characteristics such as age, gender, race, and citizenship affect labor-force participation and income. For example, older populations often have higher incomes due to accumulated wealth and greater experience, while racial and citizenship disparities can reflect inequalities documented in labor-market studies. Because economic structure and population characteristics vary widely across states, analyzing how these factors relate to wages provides insight into regional inequality and the conditions that support stronger labor-market performance.
The distribution of average hourly wages appears approximately normal, centered around $14-$16 per hour, with most state-year observations falling between $12 and $20. There are no extreme outliers or heavy tails, which suggests that linear regression with hourly wage as the response variable is appropriate. The distribution of average household income is right-skewed, with most observations in the $60,000-$100,000 range. This skew suggests income disparities across states. Because the variable is not normally distributed, I use the natural log of household income as the response in the regression analysis.
The scatterplot of average wage and employment rate shows a negative linear trend, which is surprising. This could indicate that states with higher employment rates have more low-wage jobs, and that the employment rate on its own may not be a strong predictor in the regression model.
Turning to demographic relationships, average age is positively related to wages in the plot. This is consistent with labor economics theory: older workers tend to have more experience, which is associated with higher earnings. Age could therefore be a significant predictor in the regression models. The relationship between wages and a state's White population share is slightly negative, which could reflect regional patterns, since some predominantly White states have lower wages (such as some southern states). The plot of Black population share against wages shows no clear trend, suggesting that this demographic alone does not predict wages.
Over time, average wages and average household income in states have increased. Because the values are not adjusted for inflation, this trend likely reflects both inflation and productivity growth, consistent with economic theory.
YEAR: calendar year (2000–2023)
STATEFIP: numeric state identifier
avg_hourly_wage: weighted mean of individual hourly wages
median_wage: median hourly wage (unweighted)
avg_household_income: weighted mean of household income
avg_age: weighted mean age
pct_female: weighted share (0 to 1) of respondents who are female
pct_citizen: weighted share who are U.S. citizens
pct_white and pct_black: weighted shares identified as White or Black
pct_employed, pct_unemployed, pct_not_in_labor_force: weighted shares in each labor-force status category
sample_size: number of person records in that state-year
Call:
lm(formula = avg_hourly_wage ~ pct_employed + pct_unemployed +
pct_female + pct_white + pct_black + avg_age + pct_citizen +
YEAR, data = state_summary)
Residuals:
Min 1Q Median 3Q Max
-4.0762 -0.8932 -0.0570 0.8605 5.3621
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -7.301e+02 1.881e+01 -38.815 < 2e-16 ***
pct_employed 1.139e+01 1.119e+00 10.181 < 2e-16 ***
pct_unemployed 2.227e+01 3.455e+00 6.445 1.66e-10 ***
pct_female -3.528e+01 7.021e+00 -5.026 5.77e-07 ***
pct_white -6.708e-01 4.283e-01 -1.566 0.118
pct_black 1.231e+00 6.274e-01 1.962 0.050 *
avg_age 1.979e-01 2.958e-02 6.689 3.42e-11 ***
pct_citizen -8.770e+00 6.898e-01 -12.714 < 2e-16 ***
YEAR 3.763e-01 8.609e-03 43.715 < 2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 1.414 on 1215 degrees of freedom
(102 observations deleted due to missingness)
Multiple R-squared: 0.8059, Adjusted R-squared: 0.8046
F-statistic: 630.6 on 8 and 1215 DF, p-value: < 2.2e-16
Call:
lm(formula = log(avg_household_income) ~ pct_employed + pct_unemployed +
pct_female + pct_white + pct_black + avg_age + pct_citizen +
YEAR, data = state_summary)
Residuals:
Min 1Q Median 3Q Max
-0.288903 -0.071007 0.001349 0.067534 0.290359
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -5.237e+01 1.209e+00 -43.312 < 2e-16 ***
pct_employed 2.316e+00 7.431e-02 31.172 < 2e-16 ***
pct_unemployed 1.394e+00 2.351e-01 5.931 3.84e-09 ***
pct_female 8.608e-01 4.601e-01 1.871 0.0616 .
pct_white -1.180e-01 2.853e-02 -4.136 3.76e-05 ***
pct_black 3.298e-02 4.192e-02 0.787 0.4316
avg_age 1.477e-02 1.956e-03 7.551 8.04e-14 ***
pct_citizen -1.216e+00 4.583e-02 -26.533 < 2e-16 ***
YEAR 3.103e-02 5.521e-04 56.211 < 2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 0.09815 on 1317 degrees of freedom
Multiple R-squared: 0.8774, Adjusted R-squared: 0.8767
F-statistic: 1178 on 8 and 1317 DF, p-value: < 2.2e-16
Model 1
Model 1 evaluates how demographic and labor-force characteristics relate to state average hourly wages. Because pct_not_in_labor_force is omitted, the employment coefficients describe shifting population share into employment (or unemployment) from outside the labor force. The employment rate has one of the largest positive effects: its coefficient of 11.39 implies that a 1 percentage point increase in the employment rate is associated with an average hourly wage about 11.39 cents higher, holding the other variables constant. Average age is another strong predictor, with a coefficient of 0.198: a one-year increase in a state's average age corresponds to a roughly 19.79 cent increase in the average hourly wage. The percent female variable has a significant negative relationship with wages. Its coefficient of –35.28 indicates that a 1 percentage point increase in the female share is associated with a roughly 35 cent decrease in hourly wages, suggesting that gender wage gaps may be influencing state averages. The citizenship coefficient of –8.77 means that a 1 percentage point increase in the share of citizens corresponds to an hourly wage about 8.77 cents lower, likely reflecting a regional pattern in which higher-wage states tend to have more non-citizens. Racial composition contributes less explanatory power. Percent White is not statistically significant (p = 0.118), and percent Black is only marginally significant (p ≈ 0.05) with a small positive estimate. Finally, the year coefficient of 0.376 indicates that average wages rise by roughly 38 cents per year, capturing nominal wage growth, including inflation. With an R² of 0.8059, the model explains about 81% of the variation in state wages (102 state-years with missing values were dropped). Labor-market strength, age structure, gender composition, and time trends emerge as the most influential predictors.
Model 2
Model 2 uses the natural log of average household income as the response variable, so each coefficient multiplied by the change in its predictor approximates a proportional change in income. The model fits well, explaining nearly 88% of the variation in log income across state-years (R² = 0.8774). This is higher than Model 1's R², although R² values are not directly comparable across models with different response variables. The employment rate again stands out as a key predictor: with a coefficient of 2.316, a 1 percentage point increase in the employment rate is associated with about a 2.3% increase in household income. Average age is also strongly positive. Its coefficient of 0.0148 indicates that each additional year of average age corresponds to roughly a 1.5% increase in household income. Percent White has a significant negative coefficient of –0.118, implying that a 1 percentage point increase in the White share is associated with about 0.12% lower household income, consistent with higher-income states having more diverse populations. In contrast, percent Black is not statistically significant. Percent female is weakly positive and only marginally significant (p = 0.062). Its coefficient of 0.861 suggests that a 1 percentage point increase in the female share is associated with about 0.86% higher household income. Citizenship rate is again strongly negative, with a coefficient of –1.216: a 1 percentage point increase in the citizen share predicts about a 1.2% decrease in household income. This is consistent with the Model 1 pattern in which states with more non-citizens tend to have higher wages. Finally, the year coefficient of 0.031 indicates nominal income growth of roughly 3% per year, capturing increases in household income over time due to inflation, productivity growth, and rising inequality. Overall, Model 2 shows that employment, age structure, and time trends are the strongest predictors of state-level household income, with demographic composition contributing additional but smaller effects.
Model 1
The diagnostic plots for Model 1 show that the regression fits reasonably well, though some assumptions are only partially met. The Residuals vs. Fitted plot displays a slight curved pattern, suggesting some non-linearity in the relationship between the predictors and hourly wages. The Scale–Location plot also indicates a small increase in residual variance at higher fitted values, pointing to minor heteroskedasticity. The Q–Q plot shows that most residuals follow a normal distribution, with only slight deviation in the upper tail. Finally, the Residuals vs. Leverage plot reveals no influential outliers, as no points approach high Cook’s distance values. Overall, Model 1 performs adequately, but the diagnostics suggest small departures from linearity and constant variance that should be noted when interpreting results.
Model 2
The diagnostic plots for Model 2 indicate that the log transformation of household income works well. The Residuals vs. Fitted plot shows no meaningful curvature or pattern, suggesting that the linearity condition is met. The Scale–Location plot shows roughly constant variance across fitted values (homoscedasticity). The Q–Q plot shows residuals closely following the normal line with no noticeable deviation at the tails. The Residuals vs. Leverage plot shows no outliers or influential points beyond the Cook’s distance contours. Together, these diagnostics indicate that Model 2 satisfies the regression assumptions more closely than Model 1 and fits average household income by state well.
---
title: "To be determined"
output:
  flexdashboard::flex_dashboard:
    theme:
      version: 4
      bootswatch: default
      navbar-bg: "darkblue"
    orientation: columns
    vertical_layout: fill
    source_code: embed
---
```{=html}
<head>
<base target="_blank">
</head>
```
```{r setup, include=FALSE}
library(flexdashboard)
library(tidyverse)
library(DT)
library(plotly)
library(pacman)
library(DataExplorer)
library(car)
pacman::p_load(data.table, tidyverse)

# Use fread() from data.table to read the large IPUMS extract quickly
df <- fread("~/Documents/mth369/cps_00004.csv")
glimpse(df)

# Clean data: treat top/not-in-universe codes as missing and label categories
df_clean <- df %>%
  mutate(
    HOURWAGE = ifelse(HOURWAGE >= 999, NA, HOURWAGE),
    HHINCOME = ifelse(HHINCOME >= 9999999, NA, HHINCOME),
    AGE      = ifelse(AGE > 90, NA, AGE),
    SEX      = factor(SEX, levels = c(1, 2), labels = c("Male", "Female")),
    RACE     = factor(RACE),
    CITIZEN  = factor(CITIZEN),
    EMPSTAT  = factor(EMPSTAT)
  )

# Collapse detailed EMPSTAT codes into three labor-force groups
df_clean <- df_clean %>%
  mutate(
    emp_group = case_when(
      EMPSTAT %in% c(10, 12)     ~ "employed",
      EMPSTAT %in% c(21, 22)     ~ "unemployed",
      EMPSTAT %in% c(32, 34, 36) ~ "not_in_labor_force",
      TRUE                       ~ NA_character_
    )
  )

# Aggregate to one row per state-year using the person weight ASECWT.
# Weighted means of logical conditions give weighted shares between 0 and 1.
state_summary <- df_clean %>%
  group_by(YEAR, STATEFIP) %>%
  summarize(
    avg_hourly_wage        = weighted.mean(HOURWAGE, ASECWT, na.rm = TRUE),
    median_wage            = median(HOURWAGE, na.rm = TRUE),  # unweighted
    avg_household_income   = weighted.mean(HHINCOME, ASECWT, na.rm = TRUE),
    avg_age                = weighted.mean(AGE, ASECWT, na.rm = TRUE),
    pct_female             = weighted.mean(SEX == "Female", ASECWT, na.rm = TRUE),
    pct_citizen            = weighted.mean(CITIZEN == 1, ASECWT, na.rm = TRUE),
    pct_white              = weighted.mean(RACE == 100, ASECWT, na.rm = TRUE),
    pct_black              = weighted.mean(RACE == 200, ASECWT, na.rm = TRUE),
    pct_employed           = weighted.mean(emp_group == "employed", ASECWT, na.rm = TRUE),
    pct_unemployed         = weighted.mean(emp_group == "unemployed", ASECWT, na.rm = TRUE),
    pct_not_in_labor_force = weighted.mean(emp_group == "not_in_labor_force", ASECWT, na.rm = TRUE),
    sample_size            = n(),  # number of person records in the state-year
    .groups = "drop"               # same result as the previous ungroup()
  )
glimpse(state_summary)
```
Introduction
===
Column {data-width=650}
---
### Information on my Project
**Which demographic characteristics explain differences in average state wages?**
This study examines the relationship between state-level demographics, labor force participation, and economic outcomes using pooled data from the Current Population Survey Annual Social and Economic Supplement (CPS ASEC). States differ in their demographic makeup, employment status, and wage levels. Understanding how these factors interact provides insight into broader economic performance. My primary goal is to examine how demographic and labor market characteristics such as gender, race, age, citizenship, and employment status relate to average hourly wages and household income at the state level. By studying these relationships across states and years, this study identifies which population characteristics are most strongly associated with economic success and whether certain demographic profiles are linked to stronger labor market outcomes.
This research has meaningful implications for labor economics and public policy. State governments allocate resources for workforce development, education, job training, and social assistance programs. Understanding which demographic or labor-force factors correlate with stronger wage outcomes can help policymakers allocate funding more effectively. Examining state-level demographic differences also sheds light on economic inequality, because differences in wage levels across states often translate into differences in opportunities and living standards.
My dataset contains state-level statistics from 2000 to 2023, created by computing weighted averages for each state-year observation. The data come from IPUMS CPS: [LINK](https://cps.ipums.org/cps/). After cleaning the wage and income values, recoding demographic variables, and constructing an employment status grouping (employed, unemployed, and not in the labor force), I generated state-level measures including average hourly wage, median wage, average age, household income, gender composition, citizenship rates, racial distributions, and labor force participation rates. Each row in the dataset represents the economic and demographic profile of one state in one year. This structure allows me to compare wage patterns across states, analyze how demographic differences correlate with economic outcomes, and observe how labor market indicators change over time. Because the data span more than two decades and include every state, I can use regression analysis to identify relationships between demographics and wages.
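The state-year aggregation described above can be illustrated on a tiny made-up extract. This is a base-R sketch rather than the dashboard's dplyr pipeline. The toy values and the helper name `summarize_state_year` are hypothetical. It shows that a weighted mean of a logical condition yields a population share between 0 and 1, which is how the `pct_` variables are built.

```r
# Hypothetical person-level extract: two states observed in one year.
toy <- data.frame(
  YEAR     = c(2020, 2020, 2020, 2020, 2020),
  STATEFIP = c(39, 39, 39, 26, 26),
  HOURWAGE = c(20, 30, NA, 15, 25),
  SEX      = c("Female", "Male", "Female", "Male", "Female"),
  ASECWT   = c(1, 3, 2, 1, 3)
)

# Hypothetical helper: weighted summaries for one state-year group.
summarize_state_year <- function(d) {
  data.frame(
    YEAR            = d$YEAR[1],
    STATEFIP        = d$STATEFIP[1],
    # na.rm = TRUE drops missing wages together with their weights
    avg_hourly_wage = weighted.mean(d$HOURWAGE, d$ASECWT, na.rm = TRUE),
    # TRUE/FALSE are treated as 1/0, so this is a weighted share in [0, 1]
    pct_female      = weighted.mean(d$SEX == "Female", d$ASECWT),
    sample_size     = nrow(d)
  )
}

# Split by state-year, summarize each group, and stack the results.
groups <- split(toy, list(toy$YEAR, toy$STATEFIP), drop = TRUE)
state_summary_toy <- do.call(rbind, lapply(groups, summarize_state_year))
rownames(state_summary_toy) <- NULL
print(state_summary_toy)
```

In the toy data, state 39's wage mean ignores the missing wage, giving (20·1 + 30·3) / 4 = 27.5. Its female share uses all three records.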
### Research Questions
1. How does demographic composition relate to hourly wages and household income?
2. How strongly is employment status associated with hourly wages and household income?
3. Do certain U.S. regions consistently have higher wages or household income than others? (Planned approach: group states into regions by FIPS code and map the economic indicators.)
Column {data-width=350}
---
### Background and Significance
States in the U.S. show large differences in wages, incomes, and labor market outcomes. At the same time, states differ in their demographic makeup and in how many residents are employed, unemployed, or out of the labor force. These differences raise important questions about what drives economic success at the state level.
Wages vary across states for several reasons. States with higher concentrations of high-productivity industries such as technology, finance, or infrastructure tend to pay higher wages than states that rely more on agriculture, manufacturing, or low-wage services. Differences in education levels, cost of living, and state policies also contribute to wage gaps. Labor economics research shows that more highly educated workforces generally earn more and experience stronger long-term wage growth.
Demographics also shape state economies. Characteristics such as age, gender, race, and citizenship affect labor-force participation and income. For example, older populations often have higher incomes due to accumulated wealth and greater experience, while racial and citizenship disparities can reflect inequalities documented in labor-market studies. Because economic structure and population characteristics vary widely across states, analyzing how these factors relate to wages provides insight into regional inequality and the conditions that support stronger labor-market performance.
Data
===
```{r}
DT::datatable(state_summary, rownames = FALSE, options = list(
  columnDefs = list(list(className = 'dt-center', targets = 1:5)),
  pageLength = 10))
```
EDA
===
Column {.tabset data-width=700}
---
### Hourly Wage Distribution
```{r}
ggplot(state_summary, aes(x = avg_hourly_wage)) +
  geom_histogram(binwidth = 0.5, color = "black", fill = "blue") +
  labs(title = "Distribution of Average Hourly Wages",
       x = "Average Hourly Wage", y = "Count")
```
### Household Income Distribution
```{r}
ggplot(state_summary, aes(x = avg_household_income)) +
  geom_histogram(binwidth = 5000, color = "black", fill = "blue") +
  labs(title = "Distribution of Average Household Income",
       x = "Average Household Income", y = "Count")
```
### Average Wage and Employment Rate
```{r}
ggplot(state_summary, aes(x = pct_employed, y = avg_hourly_wage)) +
  geom_point(alpha = 0.5) +
  geom_smooth(method = "lm", se = FALSE) +
  labs(title = "Employment Rate and Average Wages",
       x = "Percent Employed", y = "Average Hourly Wage")
```
### Age vs. Wages
```{r}
ggplot(state_summary, aes(x = avg_age, y = avg_hourly_wage)) +
  geom_point(alpha = 0.5) +
  geom_smooth(method = "lm", se = FALSE, color = "blue") +
  labs(title = "Relationship Between Average Age and Wages",
       x = "Average Age", y = "Average Hourly Wage")
```
### White Race % Vs. Wages
```{r}
ggplot(state_summary, aes(x = pct_white, y = avg_hourly_wage)) +
  geom_point(alpha = 0.5) +
  geom_smooth(method = "lm", se = FALSE) +
  labs(title = "Percent White and Wages",
       x = "Percent White", y = "Average Hourly Wage")
```
### Black Race % Vs. Wages
```{r}
ggplot(state_summary, aes(x = pct_black, y = avg_hourly_wage)) +
  geom_point(alpha = 0.5) +
  geom_smooth(method = "lm", se = FALSE) +
  labs(title = "Percent Black and Wages",
       x = "Percent Black", y = "Average Hourly Wage")
```
### Average Wages Over Time
```{r}
state_summary %>%
  group_by(YEAR) %>%
  summarize(mean_wage = mean(avg_hourly_wage, na.rm = TRUE)) %>%
  ggplot(aes(x = YEAR, y = mean_wage)) +
  geom_line() +
  labs(title = "Average State Hourly Wages Over Time",
       x = "Year", y = "Average Hourly Wage")
```
### Average Household Income Over Time
```{r}
state_summary %>%
  group_by(YEAR) %>%
  summarize(mean_income = mean(avg_household_income, na.rm = TRUE)) %>%
  ggplot(aes(x = YEAR, y = mean_income)) +
  # linewidth replaces the deprecated size aesthetic for lines (ggplot2 >= 3.4.0)
  geom_line(color = "blue", linewidth = 1) +
  labs(title = "Average Household Income Over Time",
       x = "Year", y = "Average Household Income")
```
Column {.tabset data-width=300}
---
### Analysis
The distribution of average hourly wages appears approximately normal, centered around $14-$16 per hour, with most state-year observations falling between $12 and $20. There are no extreme outliers or heavy tails, which suggests that linear regression with hourly wage as the response variable is appropriate. The distribution of average household income is right-skewed, with most observations in the $60,000-$100,000 range. This skew suggests income disparities across states. Because the variable is not normally distributed, I use the natural log of household income as the response in the regression analysis.
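The effect of a log transformation on right skew can be illustrated with simulated log-normal "incomes". The parameters are made up, not the CPS values. The `skewness` helper is a hypothetical, moment-based implementation written out so the sketch needs no extra packages.

```r
# Moment-based sample skewness: third central moment over sd^3.
skewness <- function(x) {
  m <- mean(x)
  mean((x - m)^3) / mean((x - m)^2)^1.5
}

set.seed(369)
# Simulated right-skewed "household income" values (log-normal), not real data.
income <- rlnorm(1000, meanlog = log(75000), sdlog = 0.5)

skew_raw <- skewness(income)       # clearly positive: long right tail
skew_log <- skewness(log(income))  # close to 0 after taking logs
cat("skewness of income:     ", round(skew_raw, 2), "\n")
cat("skewness of log(income):", round(skew_log, 2), "\n")
```

If a variable is roughly log-normal, its log is roughly normal. This is why logging household income tends to pull in the long right tail.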
The scatterplot of average wage and employment rate shows a negative linear trend, which is surprising. This could indicate that states with higher employment rates have more low-wage jobs, and that the employment rate on its own may not be a strong predictor in the regression model.
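A bivariate slope can differ in sign from a regression coefficient once other variables are held constant. In Model 1, for example, pct_employed has a positive coefficient (11.39) despite the negative scatterplot trend. The simulation below uses made-up numbers, not the CPS data. It illustrates one general mechanism: a time trend that raises wages while the employment share drifts down can make the bivariate slope negative even though the conditional slope is positive. It is not evidence about what drives the pattern in this dataset.

```r
set.seed(2024)
n <- 600

# Synthetic data: a time trend pushes wages up while the employment
# share drifts down over the same years.
year     <- sample(2000:2023, n, replace = TRUE)
employed <- 0.65 - 0.004 * (year - 2000) + rnorm(n, sd = 0.03)
wage     <- 10 + 0.40 * (year - 2000) + 10 * employed + rnorm(n, sd = 1)

# Bivariate slope, like geom_smooth(method = "lm") in the scatterplot.
slope_marginal <- coef(lm(wage ~ employed))[["employed"]]

# Slope holding year constant, like a multiple regression coefficient.
slope_conditional <- coef(lm(wage ~ employed + year))[["employed"]]

cat("bivariate slope:  ", round(slope_marginal, 1), "\n")
cat("conditional slope:", round(slope_conditional, 1), "\n")
```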
Turning to demographic relationships, average age is positively related to wages in the plot. This is consistent with labor economics theory: older workers tend to have more experience, which is associated with higher earnings. Age could therefore be a significant predictor in the regression models. The relationship between wages and a state's White population share is slightly negative, which could reflect regional patterns, since some predominantly White states have lower wages (such as some southern states). The plot of Black population share against wages shows no clear trend, suggesting that this demographic alone does not predict wages.
Over time, average wages and average household income in states have increased. Because the values are not adjusted for inflation, this trend likely reflects both inflation and productivity growth, consistent with economic theory.
### Variable Information
- YEAR: calendar year (2000–2023)
- STATEFIP: numeric state FIPS code
- avg_hourly_wage: weighted mean of individual hourly wages
- median_wage: median hourly wage (unweighted)
- avg_household_income: weighted mean of household income
- avg_age: weighted mean age
- pct_female: weighted share (0 to 1) of respondents who are female
- pct_citizen: weighted share who are U.S. citizens
- pct_white and pct_black: weighted shares identified as White or Black
- pct_employed, pct_unemployed, pct_not_in_labor_force: weighted shares in each labor-force status category
- sample_size: number of person records in that state-year
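The three labor-force shares sum to 1 for every state-year, so they cannot all enter a regression with an intercept. The simulated sketch below uses synthetic numbers, not the CPS data. It shows `lm()` reporting `NA` for the aliased share. It also shows how dropping `pct_not_in_labor_force`, as both regression models do, turns that share into the reference category.

```r
set.seed(1)
n <- 200

# Synthetic labor-force shares that sum to 1 in every state-year.
emp   <- runif(n, 0.55, 0.65)
unemp <- runif(n, 0.02, 0.06)
nilf  <- 1 - emp - unemp
wage  <- 5 + 12 * emp + 20 * unemp + rnorm(n)

# With an intercept, the three shares are perfectly collinear, so lm()
# cannot estimate all three and returns NA for the aliased one.
fit_all <- lm(wage ~ emp + unemp + nilf)
print(coef(fit_all))

# Dropping "not in labor force" makes it the reference group: each remaining
# coefficient compares moving population share into that category from it.
fit_ref <- lm(wage ~ emp + unemp)
print(coef(fit_ref))
```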
Regression Analysis
===
Column {.tabset data-width=600}
---
### Model 1 (Explaining Hourly Wages)
```{r}
# Model 1: predictors of average hourly wage
# (pct_not_in_labor_force is omitted and serves as the reference group)
model1 <- lm(avg_hourly_wage ~ pct_employed + pct_unemployed + pct_female +
               pct_white + pct_black + avg_age + pct_citizen + YEAR,
             data = state_summary)
summary(model1)
```
### Model 2 (Explaining Household Income)
```{r}
# Model 2: predictors of log average household income
model2 <- lm(log(avg_household_income) ~ pct_employed + pct_unemployed +
               pct_female + pct_white + pct_black + avg_age + pct_citizen + YEAR,
             data = state_summary)
summary(model2)
```
Column {data-width=400}
---
### Analysis
***Model 1***
Model 1 evaluates how demographic and labor-force characteristics relate to state average hourly wages. Because pct_not_in_labor_force is omitted, the employment coefficients describe shifting population share into employment (or unemployment) from outside the labor force. The employment rate has one of the largest positive effects: its coefficient of 11.39 implies that a 1 percentage point increase in the employment rate is associated with an average hourly wage about 11.39 cents higher, holding the other variables constant. Average age is another strong predictor, with a coefficient of 0.198: a one-year increase in a state's average age corresponds to a roughly 19.79 cent increase in the average hourly wage. The percent female variable has a significant negative relationship with wages. Its coefficient of –35.28 indicates that a 1 percentage point increase in the female share is associated with a roughly 35 cent decrease in hourly wages, suggesting that gender wage gaps may be influencing state averages. The citizenship coefficient of –8.77 means that a 1 percentage point increase in the share of citizens corresponds to an hourly wage about 8.77 cents lower, likely reflecting a regional pattern in which higher-wage states tend to have more non-citizens. Racial composition contributes less explanatory power. Percent White is not statistically significant (p = 0.118), and percent Black is only marginally significant (p ≈ 0.05) with a small positive estimate. Finally, the year coefficient of 0.376 indicates that average wages rise by roughly 38 cents per year, capturing nominal wage growth, including inflation. With an R² of 0.8059, the model explains about 81% of the variation in state wages (102 state-years with missing values were dropped). Labor-market strength, age structure, gender composition, and time trends emerge as the most influential predictors.
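The percentage-point interpretations follow from how the `pct_` variables are built: they are weighted shares between 0 and 1, so a 1 percentage point change is a change of 0.01. The worked calculation below uses the estimates reported in the Model 1 summary output and makes that scaling explicit.

```r
# Estimates as reported in the Model 1 summary output (response in dollars).
b_shares <- c(pct_employed = 11.39, pct_female = -35.28, pct_citizen = -8.770)

# pct_ variables are shares between 0 and 1, so 1 percentage point = 0.01.
# Multiplying by 100 converts the dollar change into cents.
cents_per_pp <- b_shares * 0.01 * 100
print(round(cents_per_pp, 2))

# avg_age and YEAR are measured in years, so the predictor is not rescaled.
b_years <- c(avg_age = 0.1979, YEAR = 0.3763)
cents_per_year <- b_years * 100
print(round(cents_per_year, 2))
```

A share coefficient therefore reads directly as cents per percentage point. A year-unit coefficient reads as dollars per year, multiplied by 100 for cents.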
***Model 2***
Model 2 uses the natural log of average household income as the response variable, so each coefficient multiplied by the change in its predictor approximates a proportional change in income. The model fits well, explaining nearly 88% of the variation in log income across state-years (R² = 0.8774). This is higher than Model 1's R², although R² values are not directly comparable across models with different response variables. The employment rate again stands out as a key predictor: with a coefficient of 2.316, a 1 percentage point increase in the employment rate is associated with about a 2.3% increase in household income. Average age is also strongly positive. Its coefficient of 0.0148 indicates that each additional year of average age corresponds to roughly a 1.5% increase in household income. Percent White has a significant negative coefficient of –0.118, implying that a 1 percentage point increase in the White share is associated with about 0.12% lower household income, consistent with higher-income states having more diverse populations. In contrast, percent Black is not statistically significant. Percent female is weakly positive and only marginally significant (p = 0.062). Its coefficient of 0.861 suggests that a 1 percentage point increase in the female share is associated with about 0.86% higher household income. Citizenship rate is again strongly negative, with a coefficient of –1.216: a 1 percentage point increase in the citizen share predicts about a 1.2% decrease in household income. This is consistent with the Model 1 pattern in which states with more non-citizens tend to have higher wages. Finally, the year coefficient of 0.031 indicates nominal income growth of roughly 3% per year, capturing increases in household income over time due to inflation, productivity growth, and rising inequality. Overall, Model 2 shows that employment, age structure, and time trends are the strongest predictors of state-level household income, with demographic composition contributing additional but smaller effects.
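In a log-level model, a coefficient b times a change Δx gives an approximate proportional change in the response. The exact percent change is 100(exp(bΔx) - 1). The sketch below applies both formulas to estimates from the Model 2 summary output. It uses Δx = 0.01 for the share variables (1 percentage point) and Δx = 1 for age and year.

```r
# Estimates as reported in the Model 2 summary output (log household income).
b <- c(pct_employed = 2.316, pct_citizen = -1.216, pct_female = 0.8608,
       avg_age = 0.01477, YEAR = 0.03103)

# Size of the change in each predictor: 0.01 for shares, 1 for years.
dx <- c(pct_employed = 0.01, pct_citizen = 0.01, pct_female = 0.01,
        avg_age = 1, YEAR = 1)

approx_pct <- 100 * b * dx              # small-change approximation
exact_pct  <- 100 * (exp(b * dx) - 1)   # exact percent change in income
print(round(cbind(approx_pct, exact_pct), 2))
```

For effects this small the two versions differ only in the second decimal place. For example, the YEAR estimate implies about 3.10% by approximation and 3.15% exactly.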
Diagnostics
===
Column {.tabset data-width=600}
---
### Diagnostic Plots for Model 1
```{r}
par(mfrow = c(2, 2))
plot(model1)
```
### Diagnostic Plots for Model 2
```{r}
par(mfrow = c(2, 2))
plot(model2)
```
Column {data-width=400}
---
### Analysis
***Model 1***
The diagnostic plots for Model 1 show that the regression fits reasonably well, though some assumptions are only partially met. The Residuals vs. Fitted plot displays a slight curved pattern, suggesting some non-linearity in the relationship between the predictors and hourly wages. The Scale–Location plot also indicates a small increase in residual variance at higher fitted values, pointing to minor heteroskedasticity. The Q–Q plot shows that most residuals follow a normal distribution, with only slight deviation in the upper tail. Finally, the Residuals vs. Leverage plot reveals no influential outliers, as no points approach high Cook’s distance values. Overall, Model 1 performs adequately, but the diagnostics suggest small departures from linearity and constant variance that should be noted when interpreting results.
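The mild heteroskedasticity seen in the Scale–Location plot could also be checked numerically. The sketch below is a hand-written version of Koenker's studentized Breusch–Pagan test: regress the squared residuals on the predictors, then compare n·R² with a chi-squared distribution whose degrees of freedom equal the number of predictors. The helper `bp_test` is hypothetical, assumes the model has an intercept, and is demonstrated on simulated data rather than the CPS extract. The `car` package loaded in the setup chunk also provides `ncvTest()` for a related score test. If such a test rejects, heteroskedasticity-robust standard errors are a common remedy.

```r
# Koenker's studentized Breusch-Pagan test (assumes an intercept in the model).
bp_test <- function(model) {
  X  <- model.matrix(model)[, -1, drop = FALSE]  # predictors without intercept
  e2 <- residuals(model)^2
  r2 <- summary(lm(e2 ~ X))$r.squared            # auxiliary regression R^2
  stat <- length(e2) * r2
  c(statistic = stat, df = ncol(X),
    p.value = pchisq(stat, df = ncol(X), lower.tail = FALSE))
}

set.seed(7)
x <- runif(500, 1, 10)
y_const  <- 2 + 3 * x + rnorm(500, sd = 2)        # constant error variance
y_spread <- 2 + 3 * x + rnorm(500, sd = 0.5 * x)  # variance grows with x

bp_const  <- bp_test(lm(y_const ~ x))
bp_spread <- bp_test(lm(y_spread ~ x))
print(round(bp_const, 4))
print(round(bp_spread, 4))
```

On the dashboard this would be called as `bp_test(model1)`. No result for the real model is claimed here.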
***Model 2***
The diagnostic plots for Model 2 indicate that the log transformation of household income works well. The Residuals vs. Fitted plot shows no meaningful curvature or pattern, suggesting that the linearity condition is met. The Scale–Location plot shows roughly constant variance across fitted values (homoscedasticity). The Q–Q plot shows residuals closely following the normal line with no noticeable deviation at the tails. The Residuals vs. Leverage plot shows no outliers or influential points beyond the Cook's distance contours. Together, these diagnostics indicate that Model 2 satisfies the regression assumptions more closely than Model 1 and fits average household income by state well.
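The Residuals vs. Leverage reading can be complemented by counting observations above common Cook's distance cutoffs. Below is a minimal sketch using base R's `cooks.distance()`. The helper name `flag_influential` is hypothetical. The 4/n and 1 thresholds are rules of thumb rather than formal tests. The demonstration uses simulated data with one planted outlier, not the CPS data.

```r
# Count observations above two rule-of-thumb Cook's distance cutoffs.
flag_influential <- function(model) {
  d <- cooks.distance(model)
  n <- length(d)
  list(
    max_cooks   = max(d, na.rm = TRUE),
    n_above_4n  = sum(d > 4 / n, na.rm = TRUE),  # lenient screening cutoff
    n_above_one = sum(d > 1, na.rm = TRUE)       # conventional "influential" cutoff
  )
}

set.seed(11)
x <- rnorm(100)
y <- 1 + 2 * x + rnorm(100)
x[100] <- 8
y[100] <- -20                  # one planted high-leverage outlier

fit <- lm(y ~ x)
res <- flag_influential(fit)
print(res)
```

On the dashboard this would be called as `flag_influential(model2)`. A count of zero above 1 would agree with the visual reading.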
Map
===
Results
===
Column {data-width=650}
---
### Findings
Column {data-width=350}
---
### Limitations
### Resources
Author
===
### About Myself
My name is Mark Burns. I am from Cleveland, Ohio, and I am currently a senior at the University of Dayton, majoring in Economics with minors in Data Analytics and Finance. I will graduate in May 2026.
This past summer, I interned as a Business Analytics intern, creating budget reports in Excel and using Power BI for further insight.
Please feel free to connect with me on LinkedIn:
[Visit my LinkedIn profile](https://www.linkedin.com/in/markdburns2)